A General-Purpose Compression Scheme for Databases
Abstract
Current adaptive compression schemes such as gzip and compress are impractical for database compression as they do not allow random access to individual records. The sequitur scheme of Nevill-Manning and Witten also adaptively compresses data, achieving excellent compression but with significant main-memory requirements. A preliminary version of sequitur used a semi-static modeling approach to achieve slightly worse compression than the adaptive approach. We describe a new variant of the semi-static sequitur algorithm, ray, that reduces main-memory use and is a candidate for general-purpose compression and random access to databases. We show that ray achieves better compression than an efficient Huffman scheme and popular adaptive compression techniques.
Similar resources
Numerical Simulation of Shock-Wave/Boundary/Layer Interactions in a Hypersonic Compression Corner Flow
Numerical results are presented for the shock-boundary layer interactions in a hypersonic flow over a sharp leading edge compression corner. In this study, a second-order Godunov type scheme based on solving a Generalized Riemann Problem (GRP) at each cell interface is used to solve the thin shear layer approximation of the laminar Navier-Stokes (N-S) equations. The calculated flow-field shows general...
A General Compression Scheme for Databases
Compression of databases not only achieves a reduction in storage space but can reduce overall retrieval times. Current schemes such as gzip and compress are impractical for the purposes of databases as they do not allow individual records to be retrieved. A recent compression scheme, sequitur, allows quick decompression of any individual section of the database; however, it uses extravagant amo...
Compression of nucleotide databases for fast searching
Motivation: International sequencing efforts are creating huge nucleotide databases, which are used in searching applications to locate sequences homologous to a query sequence. In such applications, it is desirable that databases are stored compactly, that sequences can be accessed independently of the order in which they were stored, and that data can be rapidly retrieved from secondary storag...
Investigations on Path Indexing for Graph Databases
Graph databases have become an increasingly popular choice for the management of the massive network data sets arising in many contemporary applications. We investigate the effectiveness of path indexing for accelerating query processing in graph database systems, using as an exemplar the widely used open-source Neo4j graph database. We present a novel path index design which supports efficient...
Efficient Access of Compressed Data
In this paper a compression technique is presented which allows a high degree of compression but requires only logarithmic access time. The technique is a constant suppression scheme, and is most applicable to stable databases whose distribution of constants is fairly clustered. Furthermore, the repeated use of the technique permits the suppression of a multiple number of different consta...
Publication date: 1999